1.
Virus Evol ; 10(1): veae015, 2024.
Article in English | MEDLINE | ID: mdl-38510920

ABSTRACT

We investigated transmission dynamics of a large human immunodeficiency virus (HIV) outbreak among persons who inject drugs (PWID) in KY and OH during 2017-20 by using detailed phylogenetic, network, recombination, and cluster dating analyses. Using polymerase (pol) sequences from 193 people associated with the investigation, we document high HIV-1 diversity, including Subtype B (44.6 per cent); numerous circulating recombinant forms (CRFs) including CRF02_AG (2.5 per cent) and CRF02_AG-like (21.8 per cent); and many unique recombinant forms composed of CRFs with major subtypes and sub-subtypes [CRF02_AG/B (24.3 per cent), B/CRF02_AG/B (0.5 per cent), and A6/D/B (6.4 per cent)]. Cluster analysis of sequences using a 1.5 per cent genetic distance identified thirteen clusters, including a seventy-five-member cluster composed of CRF02_AG-like and CRF02_AG/B, an eighteen-member CRF02_AG/B cluster, Subtype B clusters of sizes ranging from two to twenty-three, and a nine-member A6/D and A6/D/B cluster. Recombination and phylogenetic analyses identified CRF02_AG/B variants with ten unique breakpoints likely originating from Subtype B and CRF02_AG-like viruses in the largest clusters. The addition of contact tracing results from OH to the genetic networks identified linkage between persons with Subtype B, CRF02_AG, and CRF02_AG/B sequences in the clusters supporting de novo recombinant generation. Superinfection prevalence was 13.3 per cent (8/60) in persons with multiple specimens and included infection with B and CRF02_AG; B and CRF02_AG/B; or B and A6/D/B. In addition to the presence of multiple, distinct molecular clusters associated with this outbreak, cluster dating inferred transmission associated with the largest molecular cluster occurred as early as 2006, with high transmission rates during 2017-8 in certain other molecular clusters. 
This outbreak among PWID in KY and OH was likely driven by rapid transmission of multiple HIV-1 variants including de novo viral recombinants from circulating viruses within the community. Our findings documenting the high HIV-1 transmission rate and clustering through partner services and molecular clusters emphasize the importance of leveraging multiple different data sources and analyses, including those from disease intervention specialist investigations, to better understand outbreak dynamics and interrupt HIV spread.
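The distance-threshold clustering described in the abstract above can be illustrated with a minimal sketch. It assumes aligned sequences, a plain p-distance (the abstract does not say which distance model was used), and single-linkage grouping at the 1.5 per cent cutoff. The function names, the `mutate` helper, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in sites) / len(sites)


def distance_clusters(seqs, threshold=0.015):
    """Single-linkage clusters: connected components of the <= threshold graph."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not reported as clusters.
    return [members for members in groups.values() if len(members) >= 2]


def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at each position."""
    chars = list(seq)
    for i in positions:
        chars[i] = "A" if chars[i] != "A" else "C"
    return "".join(chars)


base = "ACGT" * 50  # 200 toy sites, not real pol sequence
toy = {
    "s1": base,
    "s2": mutate(base, [1, 5]),           # 2 of 200 sites differ from s1
    "s4": mutate(base, range(2, 82, 4)),  # 20 of 200 sites differ from s1
}
toy["s3"] = mutate(toy["s2"], [9, 13])    # 2 sites from s2, 4 sites from s1
clusters = distance_clusters(toy)
```

In this toy data s3 is more than 1.5 per cent from s1 but joins the same cluster through s2, which is the chaining behavior that lets large clusters form under a fixed threshold.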

2.
Viruses ; 15(11)2023 Nov 02.
Article in English | MEDLINE | ID: mdl-38005885

ABSTRACT

Hantaviruses zoonotically infect humans worldwide with pathogenic consequences and are mainly spread by rodents that shed aerosolized virus particles in urine and feces. Bioinformatics methods for hantavirus diagnostics, genomic surveillance and epidemiology are currently lacking a comprehensive approach for data sharing, integration, visualization, analytics and reporting. With the possibility of hantavirus cases going undetected and spreading over international borders, a significant reporting delay can miss linked transmission events and impede timely, targeted public health interventions. To overcome these challenges, we built HantaNet, a standalone visualization engine for hantavirus genomes that facilitates viral surveillance and classification for early outbreak detection and response. HantaNet is powered by MicrobeTrace, a browser-based multitool originally developed at the Centers for Disease Control and Prevention (CDC) to visualize HIV clusters and transmission networks. HantaNet integrates coding gene sequences and standardized metadata from hantavirus reference genomes into three separate gene modules for dashboard visualization of phylogenetic trees, viral strain clusters for classification, epidemiological networks and spatiotemporal analysis. We used 85 hantavirus reference datasets from GenBank to validate HantaNet as a classification and enhanced visualization tool, and as a public repository to download standardized sequence data and metadata for building analytic datasets. HantaNet is a model for how to deploy MicrobeTrace-specific tools to advance pathogen surveillance, epidemiology and public health globally.


Subjects
Communicable Diseases , Hantavirus Infections , Orthohantavirus , Animals , Humans , Orthohantavirus/genetics , Phylogeny , Hantavirus Infections/epidemiology , Communicable Diseases/epidemiology , Disease Outbreaks , Genomics , Rodentia
3.
Microbiol Spectr ; 10(2): e0256421, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35234489

ABSTRACT

Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated from these technologies remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.


Subjects
COVID-19 , Viruses , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Humans , SARS-CoV-2/genetics , Viruses/genetics
4.
Front Genet ; 11: 601870, 2020.
Article in English | MEDLINE | ID: mdl-33324449

ABSTRACT

Effective laboratory-based surveillance and public health response to bacterial meningitis depends on timely characterization of bacterial meningitis pathogens. Traditionally, characterizing bacterial meningitis pathogens such as Neisseria meningitidis (Nm) and Haemophilus influenzae (Hi) required several biochemical and molecular tests. Whole genome sequencing (WGS) has enabled the development of pipelines capable of characterizing the given pathogen with equivalent results to many of the traditional tests. Here, we present the Bacterial Meningitis Genomic Analysis Platform (BMGAP): a secure, web-accessible informatics platform that facilitates automated analysis of WGS data in public health laboratories. BMGAP is a pipeline comprised of several components, including both widely used, open-source third-party software and customized analysis modules for the specific target pathogens. BMGAP performs de novo draft genome assembly and identifies the bacterial species by whole-genome comparisons against a curated reference collection of 17 focal species including Nm, Hi, and other closely related species. Genomes identified as Nm or Hi undergo multi-locus sequence typing (MLST) and capsule characterization. Further typing information is captured from Nm genomes, such as peptides for the vaccine antigens FHbp, NadA, and NhbA. Assembled genomes are retained in the BMGAP database, serving as a repository for genomic comparisons. BMGAP's species identification and capsule characterization modules were validated using PCR and slide agglutination from 446 bacterial invasive isolates (273 Nm from nine different serogroups, 150 Hi from seven different serotypes, and 23 from nine other species) collected from 2017 to 2019 through surveillance programs. 
Among the validation isolates, BMGAP correctly identified the species for all 446 isolates (100% sensitivity and specificity) and accurately characterized all Nm serogroups (99% sensitivity and 98% specificity) and Hi serotypes (100% sensitivity and specificity). BMGAP provides an automated, multi-species analysis pipeline that can be extended to include additional analysis modules as needed. This provides easy-to-interpret and validated Nm and Hi genome analysis capacity to public health laboratories and collaborators. As the BMGAP database accumulates more genomic data, it grows as a valuable resource for rapid comparative genomic analyses during outbreak investigations.
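The sensitivity and specificity figures quoted above follow from a standard confusion-matrix calculation, sketched here under stated assumptions. The function name is hypothetical, and the example treats species identification as a one-vs-rest call for Nm, using only the isolate counts given in the abstract (273 Nm, 150 Hi, 23 other) and its statement that every species call was correct.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positives among all actual positives
    specificity = tn / (tn + fp)  # true negatives among all actual negatives
    return sensitivity, specificity


# One-vs-rest view for Nm: 273 Nm isolates, 150 Hi + 23 other = 173 non-Nm.
# With no misidentified isolates there are no false positives or negatives.
nm_sens, nm_spec = sensitivity_specificity(tp=273, fp=0, tn=150 + 23, fn=0)
```

The per-serogroup and per-serotype figures would need the underlying confusion counts, which the abstract does not report.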

5.
BMC Syst Biol ; 8: 93, 2014 Aug 13.
Article in English | MEDLINE | ID: mdl-25115450

ABSTRACT

BACKGROUND: Toxicogenomics studies often profile gene expression from assays involving multiple doses and time points. The dose- and time-dependent pattern is of great importance to assess toxicity but computational approaches are lacking to effectively utilize this characteristic in toxicity assessment. Topic modeling is a text mining approach, but may be used analogously in toxicogenomics due to the similar data structures between text and gene dysregulation. RESULTS: Topic modeling was applied to a very large toxicogenomics dataset containing microarray gene expression data from >15,000 samples associated with 131 drugs tested in three different assay platforms (i.e., in vitro assay, in vivo repeated dose study and in vivo single dose experiment) with a design including multiple doses and time points. A set of "topics" which each consist of a set of genes was determined, by which the varying sensitivity of three assay systems was observed. We found that the drug-dependent effect was more pronounced in the two in vivo systems than the in vitro system, while the time-dependent effect was most strongly reflected in the in vitro system followed by the single dose study and lastly the repeated dose experiment. The dose-dependent effect was similar across three assay systems. Although the results indicated a challenge to extrapolate the in vitro results to the in vivo situation, we did notice that, for some drugs but not for all the drugs, the similarity in gene expression patterns was observed across all three assay systems, indicating a possibility of using in vitro systems with careful designs (such as the choice of dose and time point), to replace the in vivo testing strategy. Nonetheless, a potential to replace the repeated dose study by the single-dose short-term methodology was strongly implied. 
CONCLUSIONS: The study demonstrated that text mining methodologies such as topic modeling provide an alternative method compared to traditional means for data reduction in toxicogenomics, enhancing researchers' capabilities to interpret biological information.


Subjects
Computational Biology/methods , Data Mining/methods , Toxicogenetics/methods , Dose-Response Relationship, Drug , Humans , Oligonucleotide Array Sequence Analysis , Time Factors
6.
BMC Bioinformatics ; 15: 267, 2014 Aug 08.
Article in English | MEDLINE | ID: mdl-25103881

ABSTRACT

BACKGROUND: The phenome represents a distinct set of information in the human population. It has been explored particularly in its relationship with the genome to identify correlations for diseases. The phenome has also been explored for drug repositioning with efforts focusing on the search space for the most similar candidate drugs. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were inter-connected with a probabilistic distribution and this characteristic may offer an opportunity to identify new therapeutic indications for a given drug. Correspondingly, we employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) to govern the phenome distribution. RESULTS: We developed our model on the phenome information in Side Effect Resource (SIDER). We first developed an LDA model optimized based on its recovery potential through perturbing the drug-phenotype matrix for each of the drug-indication pairs, where each drug-indication relationship was switched to "unknown" one at a time and then recovered based on the remaining drug-phenotype pairs. Of the probabilistically significant pairs, 70% were successfully recovered. Next, we applied the model to the whole phenome to narrow down repositioning candidates and suggest alternative indications. We were able to retrieve approved indications of 6 drugs whose indications were not listed in SIDER. For 908 drugs that were present with their indication information, our model suggested alternative treatment options for further investigations. Several of the suggested new uses can be supported with information from the scientific literature. CONCLUSIONS: The results demonstrated that the phenome can be further analyzed by a generative model, which can discover probabilistic associations between drugs and therapeutic uses. In this regard, LDA serves as an enrichment tool to explore new uses of existing drugs by narrowing down the search space.


Subjects
Computational Biology/methods , Drug Repositioning/methods , Models, Statistical , Phenotype , Data Mining , Databases, Pharmaceutical , Drug-Related Side Effects and Adverse Reactions
7.
BMC Bioinformatics ; 14 Suppl 14: S11, 2013.
Article in English | MEDLINE | ID: mdl-24267543

ABSTRACT

BACKGROUND: High Content Screening (HCS) has become an important tool for toxicity assessment, partly due to its advantage of handling multiple measurements simultaneously. This approach has provided insight and contributed to the understanding of systems biology at the cellular level. To fully realize this potential, the simultaneously measured multiple endpoints from a live cell should be considered in a probabilistic relationship to assess the cell's condition in response to stress from a treatment, which poses a great challenge to extract hidden knowledge and relationships from these measurements. METHOD: In this work, we applied a text mining method, Latent Dirichlet Allocation (LDA), to analyze cellular endpoints from in vitro HCS assays and related the findings to in vivo histopathological observations. We measured multiple HCS assay endpoints for 122 drugs. Since LDA requires the data to be represented in document-term format, we first converted the continuous value of the measurements to the word frequency that can be processed by the text mining tool. For each of the drugs, we generated a document for each of the 4 time points. Thus, we ended with 488 documents (drug-hour), each having different values for the 10 endpoints, which are treated as words. We extracted three topics using LDA and examined these to identify diagnostic topics for 45 drugs in common with in vivo experiments from the Japanese Toxicogenomics Project (TGP), observing their necrosis findings at 6 and 24 hours after treatment. RESULTS: We found that assay endpoints assigned to particular topics were in concordance with the histopathology observed. Drugs showing necrosis at 6 hours were linked to severe damage events such as Steatosis, DNA Fragmentation, Mitochondrial Potential, and Lysosome Mass. DNA Damage and Apoptosis were associated with drugs causing necrosis at 24 hours, suggesting an interplay of the two pathways in these drugs.
Drugs with no sign of necrosis were related to the Cell Loss and Nuclear Size assays, which is suggestive of hepatocyte regeneration. CONCLUSIONS: The evidence from this study suggests that topic modeling with LDA can enable us to interpret relationships of endpoints of in vitro assays along with an in vivo histological finding, necrosis. Effectiveness of this approach may add substantially to our understanding of systems biology.
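The document-term conversion described in the METHOD section above can be sketched as follows. The abstract does not say how continuous values were mapped to word frequencies, so the min-max scaling into integer pseudo-counts used here is an assumption, as are the function names, the `max_count` parameter, and the toy values.

```python
def to_word_counts(measurements, lo, hi, max_count=10):
    """Scale each endpoint value into an integer pseudo-count in 0..max_count."""
    counts = {}
    for endpoint, value in measurements.items():
        span = hi[endpoint] - lo[endpoint]
        scaled = 0.0 if span == 0 else (value - lo[endpoint]) / span
        scaled = min(max(scaled, 0.0), 1.0)  # clamp to [0, 1]
        counts[endpoint] = round(scaled * max_count)
    return counts


def build_documents(data):
    """Map {(drug, hour): {endpoint: value}} to {(drug, hour): {endpoint: count}}.

    Each (drug, hour) pair is one document; each endpoint is one word.
    Assumes every document reports every endpoint.
    """
    endpoints = sorted({e for m in data.values() for e in m})
    lo = {e: min(m[e] for m in data.values()) for e in endpoints}
    hi = {e: max(m[e] for m in data.values()) for e in endpoints}
    return {key: to_word_counts(m, lo, hi) for key, m in data.items()}


# Toy data: 2 hypothetical drugs x 2 time points x 2 endpoints.
data = {
    ("drugA", 1): {"Apoptosis": 0.0, "CellLoss": 5.0},
    ("drugA", 24): {"Apoptosis": 1.0, "CellLoss": 10.0},
    ("drugB", 1): {"Apoptosis": 0.5, "CellLoss": 5.0},
    ("drugB", 24): {"Apoptosis": 0.2, "CellLoss": 7.5},
}
docs = build_documents(data)

# Corpus size reported in the abstract: 122 drugs x 4 time points.
n_documents = 122 * 4
```

The resulting count dictionaries are in the bag-of-words shape that LDA implementations generally expect once arranged as a document-term matrix.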


Subjects
Data Mining , Toxicogenetics/methods , Animals , Apoptosis/drug effects , Cells, Cultured , DNA Damage , Databases, Genetic , Hepatocytes/drug effects , Hepatocytes/metabolism , High-Throughput Screening Assays , Lysosomes/metabolism , Male , Mitochondria/drug effects , Mitochondria/genetics , Mitochondria/metabolism , Necrosis/genetics , Necrosis/metabolism , Rats , Rats, Sprague-Dawley
8.
Toxicol Sci ; 136(1): 242-9, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23997115

ABSTRACT

Drug-induced liver injury (DILI) is one of the leading causes of the termination of drug development programs. Consequently, identifying the risk of DILI in humans for drug candidates during the early stages of the development process would greatly reduce the drug attrition rate in the pharmaceutical industry but would require the implementation of new research and development strategies. In this regard, several in silico models have been proposed as alternative means of prioritizing drug candidates. Because the accuracy and utility of a predictive model rests largely on how to annotate the potential of a drug to cause DILI in a reliable and consistent way, the Food and Drug Administration-approved drug labeling was given prominence. Out of 387 drugs annotated, 197 drugs were used to develop a quantitative structure-activity relationship (QSAR) model, and the model was subsequently challenged by the remaining drugs, which served as an external validation set, with an overall prediction accuracy of 68.9%. The performance of the model was further assessed by the use of 2 additional independent validation sets, and the 3 validation data sets have a total of 483 unique drugs. We observed that the QSAR model's performance varied for drugs with different therapeutic uses; however, it achieved a better estimated accuracy (73.6%) as well as negative predictive value (77.0%) when focusing only on those therapeutic categories with high prediction confidence. Thus, the model's applicability domain was defined. Taken collectively, the developed QSAR model has the potential utility to prioritize compounds' risk for DILI in humans, particularly for the high-confidence therapeutic subgroups like analgesics, antibacterial agents, and antihistamines.


Subjects
Chemical and Drug Induced Liver Injury/etiology , Drug Approval , Drug Labeling , Drug-Related Side Effects and Adverse Reactions/etiology , Pharmaceutical Preparations/chemistry , United States Food and Drug Administration , Computer Simulation , Humans , Models, Molecular , Molecular Structure , Pharmaceutical Preparations/classification , Quantitative Structure-Activity Relationship , Reproducibility of Results , Risk Assessment , Risk Factors , United States
9.
Am J Pathol ; 182(4): 1180-7, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23395088

ABSTRACT

Drug-induced liver injury (DILI) may present any morphologic characteristic of acute or chronic liver disease with no standardized terminology in place. Defining lexemes of DILI histopathology would allow the development of advanced knowledge discovery and data mining tools for comparisons across publicly available information. For these purposes, a DILI ontology (DILIo) was developed by using the Unified Medical Language System tool and the standardized terminology of the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT). The DILIo was entrained on findings of 114 US Food and Drug Administration-approved drugs by extracting all clinically DILI-related histopathologic descriptions for 1082 liver biopsy samples, which were then analyzed using the Unified Medical Language System MetaMap and subsequently mapped to the SNOMED CT. The DILIo provides a standard means to describe and organize liver injury induced by drugs, enabling comparative analysis of drugs within and across histopathologic terms. The analysis showed that flutamide, troglitazone, diclofenac, isoniazid, and tamoxifen were reported to have the most diverse histopathologic observations in liver biopsy. Necrosis, cholestasis, fatty degeneration, fibrosis, infiltrate, and hepatic necrosis were the most frequent terms used as descriptors of histopathologic features of DILI. In conclusion, DILIo entrains different algorithms for an efficient meta-analysis of published findings for an improved understanding of mechanisms and clinical characteristics of DILI.


Subjects
Chemical and Drug Induced Liver Injury/pathology , Terminology as Topic , Drug-Related Side Effects and Adverse Reactions , Humans , Liver/pathology , Publications , Thioguanine/adverse effects
10.
Hum Genomics ; 6: 5, 2012 Jul 05.
Article in English | MEDLINE | ID: mdl-23245293

ABSTRACT

A genetic association study is a complicated process that involves collecting phenotypic data, generating genotypic data, analyzing associations between genotypic and phenotypic data, and interpreting genetic biomarkers identified. SNPTrack is an integrated bioinformatics system developed by the US Food and Drug Administration (FDA) to support the review and analysis of pharmacogenetics data resulting from FDA research or submitted by sponsors. The system integrates data management, analysis, and interpretation in a single platform for genetic association studies. Specifically, it stores genotyping data and single-nucleotide polymorphism (SNP) annotations along with study design data in an Oracle database. It also integrates popular genetic analysis tools, such as PLINK and Haploview. SNPTrack provides genetic analysis capabilities and captures analysis results in its database as SNP lists that can be cross-linked for biological interpretation to gene/protein annotations, Gene Ontology, and pathway analysis data. With SNPTrack, users can do the entire stream of bioinformatics jobs for genetic association studies. SNPTrack is freely available to the public at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm.


Subjects
Computational Biology/methods , Databases, Genetic , Polymorphism, Single Nucleotide , Gene Ontology , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genotype , Humans , Internet , Phenotype , Signal Transduction/genetics , Software
11.
BMC Bioinformatics ; 13 Suppl 15: S6, 2012.
Article in English | MEDLINE | ID: mdl-23046522

ABSTRACT

BACKGROUND: Drug repositioning offers an opportunity to revitalize the slowing drug discovery pipeline by finding new uses for currently existing drugs. Our hypothesis is that drugs sharing similar side effect profiles are likely to be effective for the same disease, and thus repositioning opportunities can be identified by finding drug pairs with similar side effects documented in U.S. Food and Drug Administration (FDA) approved drug labels. The safety information in the drug labels is usually obtained in clinical trials and augmented with the observations in the post-market use of the drug. Therefore, our drug repositioning approach can take advantage of more comprehensive safety information than the conventional de novo approach. METHOD: A probabilistic topic model was constructed based on the terms in the Medical Dictionary for Regulatory Activities (MedDRA) that appeared in the Boxed Warning, Warnings and Precautions, and Adverse Reactions sections of the labels of 870 drugs. Fifty-two unique topics, each containing a set of terms, were identified by using topic modeling. The resulting probabilistic topic associations were used to measure the distance (similarity) between drugs. The success of the proposed model was evaluated by comparing a drug and its nearest neighbor (i.e., a drug pair) for common indications found in the Indications and Usage Section of the drug labels. RESULTS: Given a drug with more than three indications, the model yielded a 75% recall, meaning 75% of drug pairs shared one or more common indications. This is significantly higher than the 22% recall rate achieved by random selection. Additionally, the recall rate grows rapidly as the number of drug indications increases and reaches 84% for drugs with 11 indications.
The analysis also demonstrated that 65 drugs with a Boxed Warning, which indicates significant risk of serious and possibly life-threatening adverse effects, might be replaced with safer alternatives that do not have a Boxed Warning. In addition, we identified two therapeutic groups of drugs (Musculo-skeletal system and Anti-infective for systemic use) where over 80% of the drugs have a potential replacement with high significance. CONCLUSION: Topic modeling can be a powerful tool for the identification of repositioning opportunities by examining the adverse event terms in FDA approved drug labels. The proposed framework not only suggests drugs that can be repurposed, but also provides insight into the safety of repositioned drugs.
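The evaluation described above (pair each drug with its nearest neighbor in topic space, then check for a shared indication) can be sketched as below. The abstract does not name the distance measure, so the Hellinger distance used here is an assumption, and the function names, drug names, topic vectors, and indications are all hypothetical toy values.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


def nearest_neighbor(drug, topics):
    """Return the other drug whose topic distribution is closest."""
    others = (d for d in topics if d != drug)
    return min(others, key=lambda d: hellinger(topics[drug], topics[d]))


def pair_recall(topics, indications):
    """Fraction of drugs sharing at least one indication with their nearest neighbor."""
    hits = sum(
        1 for d in topics
        if indications[d] & indications[nearest_neighbor(d, topics)]
    )
    return hits / len(topics)


# Toy topic distributions over 3 topics (the study reports 52 topics).
topics = {
    "drugA": [0.8, 0.1, 0.1],
    "drugB": [0.7, 0.2, 0.1],
    "drugC": [0.1, 0.1, 0.8],
    "drugD": [0.1, 0.2, 0.7],
}
indications = {
    "drugA": {"hypertension"},
    "drugB": {"hypertension", "angina"},
    "drugC": {"epilepsy"},
    "drugD": {"migraine"},
}
recall = pair_recall(topics, indications)
```

Here drugA and drugB pair with each other and share an indication, while drugC and drugD pair with each other but share none, so half of the pairs count as hits.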


Subjects
Drug Repositioning , Drug-Related Side Effects and Adverse Reactions , Models, Theoretical , Drug Labeling , United States , United States Food and Drug Administration
12.
BMC Genomics ; 13: 325, 2012 Jul 20.
Article in English | MEDLINE | ID: mdl-22817640

ABSTRACT

BACKGROUND: Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. RESULTS: atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. CONCLUSION: atBioNet is a free web-based network analysis tool that provides a systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery.
It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
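The abstract above names SCAN as the clustering algorithm. As a partial illustration, the sketch below computes SCAN's structural similarity between connected vertices and identifies core vertices; it does not perform the full cluster expansion or the hub and outlier labeling. The function names, the toy protein identifiers, and the `eps` and `mu` values are hypothetical, and nothing here reflects atBioNet's actual implementation or parameter settings.

```python
import math


def closed_neighborhoods(edges):
    """Map each vertex to its neighbors plus itself (SCAN's vertex structure)."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, {u}).add(v)
        gamma.setdefault(v, {v}).add(u)
    return gamma


def structural_similarity(gamma, u, v):
    """Shared closed-neighborhood size, normalized by the geometric mean of sizes."""
    return len(gamma[u] & gamma[v]) / math.sqrt(len(gamma[u]) * len(gamma[v]))


def core_vertices(edges, eps=0.8, mu=3):
    """Vertices with at least mu members in their eps-neighborhood."""
    gamma = closed_neighborhoods(edges)
    cores = set()
    for u in gamma:
        eps_neighborhood = {
            v for v in gamma[u] if structural_similarity(gamma, u, v) >= eps
        }
        if len(eps_neighborhood) >= mu:
            cores.add(u)
    return cores


# Toy interaction network: a triangle (P1, P2, P3) with a pendant protein P4.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
gamma = closed_neighborhoods(edges)
cores = core_vertices(edges, eps=0.8, mu=3)
```

The densely connected triangle yields three core vertices, while the pendant P4 does not qualify, which is the intuition behind treating tightly interlinked proteins as a functional module.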


Subjects
Biomarkers/metabolism , Genomics , Software , Algorithms , Cluster Analysis , Databases, Protein , Humans , Metabolic Networks and Pathways , Protein Interaction Maps , User-Computer Interface
13.
BMC Bioinformatics ; 12 Suppl 10: S3, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22166133

ABSTRACT

BACKGROUND: Genomic biomarkers play an increasing role in both preclinical and clinical application. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i.e., genomics biomarkers) capable of reliable prediction for an observed or measured outcome (i.e., endpoint) of unknown samples in preclinical and clinical practice remains a considerable challenge. No straightforward guidelines exist for selecting a single model that will perform best when presented with unknown samples. In the second phase of the MicroArray Quality Control (MAQC-II) project, 36 analysis teams produced a large number of models for 13 preclinical and clinical endpoints. Before external validation was performed, each team nominated one model per endpoint (referred to here as 'nominated models') from which MAQC-II experts selected 13 'candidate models' to represent the best model for each endpoint. Both the nominated and candidate models from MAQC-II provide benchmarks to assess other methodologies for developing microarray-based predictive models. METHODS: We developed a simple ensemble method by taking a number of the top performing models from cross-validation and developing an ensemble model for each of the MAQC-II endpoints. We compared the ensemble models with both nominated and candidate models from MAQC-II using blinded external validation. RESULTS: For 10 of the 13 MAQC-II endpoints originally analyzed by the MAQC-II data analysis team from the National Center for Toxicological Research (NCTR), the ensemble models achieved equal or better predictive performance than the NCTR nominated models. Additionally, the ensemble models had performance comparable to the MAQC-II candidate models. Most ensemble models also had better performance than the nominated models generated by five other MAQC-II data analysis teams that analyzed all 13 endpoints. 
CONCLUSIONS: Our findings suggest that an ensemble method can often attain a higher average predictive performance in an external validation set than a corresponding "optimized" model method. Using an ensemble method to determine a final model is a potentially important supplement to the good modeling practices recommended by the MAQC-II project for developing microarray-based genomic biomarkers.
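The ensemble idea in the METHODS section above (take the top-performing models from cross-validation and combine them) can be sketched as follows. The abstract does not state how many models were kept or how their predictions were combined, so the `top_k` value and the majority vote are assumptions, and the model names, scores, and toy classifiers are hypothetical.

```python
def build_ensemble(models, cv_scores, top_k=3):
    """Keep the names of the top_k models ranked by cross-validation score."""
    ranked = sorted(models, key=lambda name: cv_scores[name], reverse=True)
    return ranked[:top_k]


def ensemble_predict(ensemble, models, sample):
    """Majority vote over binary (0/1) predictions; ties resolve to 0."""
    votes = [models[name](sample) for name in ensemble]
    return int(sum(votes) * 2 > len(votes))


# Toy classifiers over two hypothetical expression features.
models = {
    "m1": lambda x: int(x["g1"] > 0.5),
    "m2": lambda x: int(x["g2"] > 0.5),
    "m3": lambda x: int(x["g1"] + x["g2"] > 1.0),
    "m4": lambda x: 0,
}
cv_scores = {"m1": 0.80, "m2": 0.75, "m3": 0.78, "m4": 0.50}

ensemble = build_ensemble(models, cv_scores)
call_a = ensemble_predict(ensemble, models, {"g1": 0.9, "g2": 0.3})
call_b = ensemble_predict(ensemble, models, {"g1": 0.2, "g2": 0.6})
```

Selecting on cross-validation scores alone keeps the external validation set untouched until the final comparison, which matches the blinded evaluation the abstract describes.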


Subjects
Models, Genetic , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Toxicogenetics/methods , Gene Expression Profiling/methods , Humans , Meta-Analysis as Topic , Quality Control
14.
PLoS Comput Biol ; 7(12): e1002310, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22194678

ABSTRACT

Drug-induced liver injury (DILI) is a significant concern in drug development due to the poor concordance between preclinical and clinical findings of liver toxicity. We hypothesized that the DILI types (hepatotoxic side effects) seen in the clinic can be translated into the development of predictive in silico models for use in the drug discovery phase. We identified 13 hepatotoxic side effects with high accuracy for classifying marketed drugs for their DILI potential. We then developed in silico predictive models for each of these 13 side effects, which were further combined to construct a DILI prediction system (DILIps). The DILIps yielded 60-70% prediction accuracy for three independent validation sets. To enhance the confidence for identification of drugs that cause severe DILI in humans, the "Rule of Three" was developed in DILIps by using a consensus strategy based on 13 models. This gave high positive predictive value (91%) when applied to an external dataset containing 206 drugs from three independent literature datasets. Using the DILIps, we screened all the drugs in DrugBank and investigated their DILI potential in terms of protein targets and therapeutic categories through network modeling. We demonstrated that two therapeutic categories, anti-infectives for systemic use and musculoskeletal system drugs, were enriched for DILI, which is consistent with current knowledge. We also identified protein targets and pathways that are related to drugs that cause DILI by using pathway analysis and co-occurrence text mining. While marketed drugs were the focus of this study, the DILIps has a potential as an evaluation tool to screen and prioritize new drug candidates or chemicals, such as environmental chemicals, to avoid those that might cause liver toxicity. We expect that the methodology can be also applied to other drug safety endpoints, such as renal or cardiovascular toxicity.
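The consensus strategy mentioned above can be illustrated with a short sketch. The abstract does not define the "Rule of Three", so the reading used here (flag a drug when at least three of the 13 side-effect models call it positive) is an assumption, as are the function names and the toy calls. The second function shows how a positive predictive value such as the reported 91% is computed from flagged drugs.

```python
def rule_of_three(model_calls, min_positive=3):
    """Consensus call: positive when at least min_positive models agree."""
    return sum(model_calls) >= min_positive


def positive_predictive_value(predicted, actual):
    """Share of flagged drugs that are truly positive (NaN if none are flagged)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if tp + fp else float("nan")


# Toy calls from 13 hypothetical side-effect models for two drugs.
flagged = rule_of_three([1, 1, 1] + [0] * 10)      # three positive models
not_flagged = rule_of_three([1, 1] + [0] * 11)     # only two positive models

# Toy evaluation: three drugs flagged, two of them truly DILI-positive.
ppv = positive_predictive_value(
    predicted=[True, True, False, True],
    actual=[True, False, False, True],
)
```

Requiring agreement among several models trades sensitivity for confidence, which is consistent with the abstract's emphasis on a high positive predictive value.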


Subjects
Chemical and Drug Induced Liver Injury/metabolism , Drug-Related Side Effects and Adverse Reactions , Models, Biological , Animals , Anti-Infective Agents/adverse effects , Anti-Inflammatory Agents/adverse effects , Databases, Factual , Humans , Liver/drug effects , Predictive Value of Tests
15.
Chem Res Toxicol ; 24(7): 1062-70, 2011 Jul 18.
Article in English | MEDLINE | ID: mdl-21627106

ABSTRACT

The primary testing strategy to identify nongenotoxic carcinogens largely relies on the 2-year rodent bioassay, which is time-consuming and labor-intensive. There is an increasing effort to develop alternative approaches to prioritize the chemicals for, supplement, or even replace the cancer bioassay. In silico approaches based on quantitative structure-activity relationships (QSAR) are rapid and inexpensive and thus have been investigated for such purposes. A slightly more expensive approach based on short-term animal studies with toxicogenomics (TGx) represents another attractive option for this application. Thus, the primary questions are how much better predictive performance using short-term TGx models can be achieved compared to that of QSAR models, and what length of exposure is sufficient for high quality prediction based on TGx. In this study, we developed predictive models for rodent liver carcinogenicity using gene expression data generated from short-term animal models at different time points and QSAR. The study was focused on the prediction of nongenotoxic carcinogenicity since the genotoxic chemicals can be inexpensively removed from further development using various in vitro assays individually or in combination. We identified 62 chemicals whose hepatocarcinogenic potential was available from the National Center for Toxicological Research liver cancer database (NCTRlcdb). The gene expression profiles of liver tissue obtained from rats treated with these chemicals at different time points (1 day, 3 days, and 5 days) are available from the Gene Expression Omnibus (GEO) database. Both TGx and QSAR models were developed on the basis of the same set of chemicals using the same modeling approach, a nearest-centroid method with a minimum redundancy and maximum relevancy-based feature selection with performance assessed using compound-based 5-fold cross-validation. We found that the TGx models outperformed QSAR in every aspect of modeling. For example, the TGx models' predictive accuracy (0.77, 0.77, and 0.82 for the 1-day, 3-day, and 5-day models, respectively) was much higher for an independent validation set than that of a QSAR model (0.55). Permutation tests confirmed the statistical significance of the model's prediction performance. The study concluded that a short-term 5-day TGx animal model holds the potential to predict nongenotoxic hepatocarcinogenicity.
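
The modeling approach named in this abstract (a nearest-centroid classifier assessed by compound-based 5-fold cross-validation) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the minimum-redundancy, maximum-relevancy feature selection step is omitted, each row is assumed to represent one compound so that splitting rows keeps a compound in a single fold, and all function names are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_centroids(X, y):
    """Compute one centroid per class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in groups.items()}


def predict(centroids, row):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; each row (compound) is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

Calling `cross_validate(X, y)` with a feature matrix (one row per compound) and a list of class labels returns the fraction of held-out compounds classified correctly, which corresponds to the accuracy figures quoted in the abstract.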


Subjects
Carcinogens/toxicity , Liver/drug effects , Quantitative Structure-Activity Relationship , Toxicogenetics , Animals , Databases, Factual , Gene Expression Profiling , Mice , Models, Animal , Rats , Software , Time Factors , Toxicity Tests
16.
Clin Transl Sci ; 4(1): 17-23, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21348951

ABSTRACT

A three-stage approach was undertaken using genome-wide, case-control, and case-only association studies to identify genetic variants associated with heart failure mortality. In an Amish founder population (n = 851), cardiac hypertrophy, a trait integral to the adaptive response to failure, was found to be heritable (h² = 0.28, p = 0.0002) and GWAS revealed 21 candidate hypertrophy SNPs. In a case (n = 1,610)-control (n = 463) study in unrelated Caucasians, one of the SNPs associated with hypertrophy (rs2207418, p = 8 × 10⁻⁶) was associated with heart failure, RR = 1.85 (1.25-2.73, p = 0.0019). In heart failure cases rs2207418 was associated with increased mortality, HR = 1.51 (1.20-1.97, p = 0.0004). There was consistency between studies, with the GG allele being associated with increased ventricular mass (~13 g/m²) in the Amish, heart failure risk, and heart failure mortality. This SNP is in a gene desert of chromosome 20p12. Five genes are within 2.0 Mbp of rs2207418 but with low LD between their SNPs and rs2207418. A region near this SNP is highly conserved in multiple vertebrates (LOD score = 1,208). This conservation and the internal consistency across studies suggest that this region has biologic importance in heart failure, potentially acting as an enhancer or repressor element. rs2207418 may be useful for predicting a more progressive form of heart failure that may require aggressive therapy.


Subjects
Cardiomegaly/complications , Cardiomegaly/genetics , Founder Effect , Genetic Predisposition to Disease , Heart Failure/genetics , Heart Failure/mortality , Polymorphism, Single Nucleotide/drug effects , Adult , Aged , Aged, 80 and over , Base Sequence , Cardiomegaly/diagnostic imaging , Cohort Studies , Demography , Ethnicity/genetics , Female , Heart Failure/complications , Heart Failure/diagnostic imaging , Heart Ventricles/pathology , Humans , Male , Middle Aged , Organ Size , Polymorphism, Single Nucleotide/genetics , Risk Factors , Sequence Homology, Nucleic Acid , Short Interspersed Nucleotide Elements/genetics , Ultrasonography , Young Adult
17.
Adv Genet ; 72: 181-93, 2010.
Article in English | MEDLINE | ID: mdl-21029853

ABSTRACT

The KGraph is a data visualization system that has been developed to display the complex relationships between the univariate and bivariate associations among an outcome of interest, a set of covariates, and a set of genetic variations such as single-nucleotide polymorphisms (SNPs). It allows for easy simultaneous viewing and interpretation of genetic associations, correlations among covariates and SNPs, and information about the replication and cross-validation of these associations. The KGraph allows the user to more easily investigate multicollinearity and confounding through visualization of the multidimensional correlation structure underlying genetic associations. It emphasizes gene-environment interactions, gene-gene interactions, and correlations, all important components of the complex genetic architecture of most human traits. The KGraph was designed for use in gene-centric studies, but can be integrated into association analysis workflows as well. The software is available at http://www.epidkardia.sph.umich.edu/software/kgrapher.


Subjects
Genome-Wide Association Study/methods , Software , Disease/genetics , Humans , Polymorphism, Single Nucleotide , User-Computer Interface
18.
Hum Genomics ; 4(6): 428-34, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20846933

ABSTRACT

ArrayTrack is a Food and Drug Administration (FDA) bioinformatics tool that has been widely adopted by the research community for genomics studies. It provides an integrated environment for microarray data management, analysis and interpretation. Most of its functionality for statistical, pathway and gene ontology analysis can also be applied independently to data generated by other molecular technologies. ArrayTrack has been undergoing active development and enhancement since its inception in 2001. This review summarises its key functionalities, with emphasis on the most recent extensions in support of the evolving needs of FDA's research programmes. ArrayTrack has added the capability to manage, analyse and interpret proteomics and metabolomics data after quantification of peptide and metabolite abundance, respectively. Annotation information about single nucleotide polymorphisms and quantitative trait loci has been integrated to support genetics-related studies. Other extensions have been added to manage and analyse genomics data related to bacterial food-borne pathogens.


Subjects
Biomedical Research/methods , Computational Biology/methods , Software , United States Food and Drug Administration , Humans , Polymorphism, Single Nucleotide/genetics , United States
19.
J Am Coll Cardiol ; 54(5): 432-44, 2009 Jul 28.
Article in English | MEDLINE | ID: mdl-19628119

ABSTRACT

OBJECTIVES: This study sought to identify genetic modifiers of beta-blocker response and long-term survival in heart failure (HF). BACKGROUND: Differences in beta-blocker treatment effect between Caucasians and African Americans with HF have been reported. METHODS: This was a prospective cohort study of 2,460 patients (711 African American, 1,749 Caucasian) enrolled between 1999 and 2007; 2,039 patients (81.7%) were treated with a beta-blocker. Each was genotyped for beta1-adrenergic receptor (ADRB1) Arg389>Gly and G-protein receptor kinase 5 (GRK5) Gln41>Leu polymorphisms, which are more prevalent among African Americans than Caucasians. The primary end point was survival time from HF onset. RESULTS: There were 765 deaths during follow-up (median 46 months). Beta-blocker treatment increased survival in Caucasians (log-rank p = 0.00038) but not African Americans (log-rank p = 0.327). Among patients not taking beta-blockers, ADRB1 Gly389 was associated with decreased survival in Caucasians (hazard ratio [HR]: 1.98, 95% confidence interval [CI]: 1.1 to 3.7, p = 0.03) whereas GRK5 Leu41 was associated with improved survival in African Americans (HR: 0.325, CI: 0.133 to 0.796, p = 0.01). African Americans with ADRB1 Gly389Gly GRK5 Gln41Gln derived a similar survival benefit from beta-blocker therapy (HR: 0.385, 95% CI: 0.182 to 0.813, p = 0.012) as Caucasians with the same genotype (HR: 0.529, 95% CI: 0.326 to 0.858, p = 0.0098). CONCLUSIONS: These data show that differences caused by beta-adrenergic receptor signaling pathway gene polymorphisms, rather than race, are the major factors contributing to apparent differences in the beta-blocker treatment effect between Caucasians and African Americans; proper evaluation of treatment response should account for genetic variance.


Subjects
Heart Failure/genetics , Heart Failure/mortality , Receptors, Adrenergic, beta/genetics , Adrenergic beta-Antagonists/pharmacology , Adrenergic beta-Antagonists/therapeutic use , Black or African American , Aged , Cohort Studies , Female , G-Protein-Coupled Receptor Kinase 5/genetics , Genotype , Heart Failure/drug therapy , Humans , Male , Middle Aged , Polymorphism, Genetic , Prospective Studies , Receptors, Adrenergic, beta/drug effects , Signal Transduction/genetics , Survival Rate , Time Factors , Treatment Outcome , White People
20.
BMC Med Genomics ; 2: 16, 2009 Apr 07.
Article in English | MEDLINE | ID: mdl-19351393

ABSTRACT

BACKGROUND: Subcortical white matter hyperintensity on magnetic resonance imaging (MRI) of the brain, referred to as leukoaraiosis, is associated with increased risk of stroke and dementia. Hypertension may contribute to leukoaraiosis by accelerating the process of arteriosclerosis involving penetrating small arteries and arterioles in the brain. Leukoaraiosis volume is highly heritable but shows significant inter-individual variability that is not predicted well by any clinical covariates (except for age) or by single SNPs. METHODS: As part of the Genetics of Microangiopathic Brain Injury (GMBI) Study, 777 individuals (74% hypertensive) underwent brain MRI and were genotyped for 1649 SNPs from genes known or hypothesized to be involved in arteriosclerosis and related pathways. We examined SNP main effects, epistatic (gene-gene) interactions, and context-dependent (gene-environment) interactions between these SNPs and covariates (including conventional and novel risk factors for arteriosclerosis) for association with leukoaraiosis volume. Three methods were used to reduce the chance of false positive associations: 1) false discovery rate (FDR) adjustment for multiple testing, 2) an internal replication design, and 3) a ten-iteration four-fold cross-validation scheme. RESULTS: Four SNP main effects (in F3, KITLG, CAPN10, and MMP2), 12 SNP-covariate interactions (including interactions between KITLG and homocysteine, and between TGFB3 and both physical activity and C-reactive protein), and 173 SNP-SNP interactions were significant, replicated, and cross-validated. While a model containing the top single SNPs with main effects predicted only 3.72% of variation in leukoaraiosis in independent test samples, a multiple variable model that included the four most highly predictive SNP-SNP and SNP-covariate interactions predicted 11.83%. CONCLUSION: These results indicate that the genetic architecture of leukoaraiosis is complex, yet predictive, when the contributions of SNP main effects are considered in combination with effects of SNP interactions with other genes and covariates.
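
The first of the three safeguards listed in this abstract, FDR adjustment for multiple testing, can be illustrated with a short Python sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up adjustment shown here is an assumption chosen because it is the most common one, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    ASSUMPTION: the abstract only says "FDR adjustment"; BH is used here
    as a common choice, not as the procedure the study is known to have used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test (for example, one SNP-SNP interaction) would then be declared significant when its adjusted value falls below the chosen FDR level, such as 0.05.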
